RDI System for Extrinsic Plagiarism Detection (RDI_RED), Working Notes for PANAraPlagDet at FIRE 2015
Authors
Abstract
Extrinsic plagiarism detection has attracted the attention of many researchers lately. Plagiarism has become increasingly difficult to detect because of sophisticated techniques that go beyond direct copy and paste, such as phrase rephrasing, word shuffling, and semantic substitution. In this paper, we present the RDI system for extrinsic plagiarism detection (RDI_RED). RDI_RED performs well across a wide spectrum of plagiarism techniques, from simple copy-paste to word shuffling and complete sentence rephrasing. RDI_RED took the first three positions in the Arabic language plagiarism detection competition with a PlagDet (Plagiarism Detection) score of 80%, which is 20% higher than the baseline and 18% higher than the second-best competing system.
Similar resources
RDI System for Intrinsic Plagiarism Detection (RDI_RID), Working Notes for PANAraPlagDet at FIRE 2015
Many researchers have been investigating the task of plagiarism detection lately. In this paper we present the RDI system for intrinsic plagiarism detection (RDI_RID). RDI_RID was the only system that participated in the intrinsic track of the Arabic language plagiarism detection competition. RDI_RID achieved a PlagDet (Plagiarism Detection) score of 19% compared to 38% achieved by the ba...
Arabic Plagiarism Detection Using Word Correlation in N-Grams with K-Overlapping Approach, Working Notes for PAN-AraPlagDet at FIRE 2015
This report describes the Arabic plagiarism detection system that we used to submit our run to the AraPlagDetect competition at FIRE 2015. The system was constructed in four main stages. The first is pre-processing, which includes tokenisation and stop-word removal. The second is retrieving a list of candidate documents for each suspicious document using K-gram fingerprinting and the Jaccard coefficient....
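The candidate-retrieval idea named in this abstract (K-gram fingerprints compared with the Jaccard coefficient) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, whitespace tokenisation, the value of `k`, and the stop-word handling are all assumptions made for illustration.

```python
def kgrams(tokens, k):
    """Return the set of contiguous k-token tuples found in a token list."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_candidates(suspicious, sources, k=3, stop_words=frozenset()):
    """Rank source documents by k-gram Jaccard similarity to a suspicious text.

    `sources` maps a document name to its text. Tokenisation is a plain
    lowercase whitespace split (an assumption; real Arabic text would need
    a proper tokeniser and normalisation).
    """
    def prep(text):
        return [t for t in text.lower().split() if t not in stop_words]

    susp = kgrams(prep(suspicious), k)
    scores = {
        name: jaccard(susp, kgrams(prep(text), k))
        for name, text in sources.items()
    }
    # Highest similarity first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A retrieval stage of this kind would typically keep only the top-ranked sources, or those above a similarity threshold, as candidates for detailed comparison.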
Developing Monolingual Persian Corpus for Extrinsic Plagiarism Detection Using Artificial Obfuscation: Notebook for PAN at CLEF 2015
The task of text alignment corpus construction at PAN 2015 competition consists of preparing a plagiarism corpus so that it can provide various obfuscation types and versatile obfuscation degrees. Meanwhile, its format and metadata structure should follow previous PAN plagiarism corpora. In this paper, we describe our approach for construction of a monolingual Persian plagiarism corpus that can...
Overview of the AraPlagDet PAN@FIRE2015 Shared Task on Arabic Plagiarism Detection
AraPlagDet is the first shared task that addresses the evaluation of plagiarism detection methods for Arabic texts. It has two subtasks, namely external plagiarism detection and intrinsic plagiarism detection. A total of 8 runs have been submitted and tested on the standardized corpora developed for the track. This overview paper describes these evaluation corpora, discusses the participants’ m...
Normalization based Stop-Word approach to Source Code Plagiarism Detection
This paper is a report of PES Institute of Technology’s participation in the Cross Language Detection of Source Code Reuse (CL-SOCO) task at FIRE 2015 [1]. We approach this task as a text document plagiarism task, without considering the formal grammatical structure of the programming language. We use normalization of commonly used identifiers to detect pairs of programs which have the same objective. We als...